ABSTRACT

This paper describes the derivation of several item selection algorithms 
for use in fitting test items to target information functions. These 
algorithms circumvent iterative solutions by using the criterion of moving
averages of the distance to a target information function and simultaneously
considering the entire range of ability points used to condition the
information functions. The algorithms were implemented in a microcomputer
software package and tested by generating six forms of an ACT math test, each
fit to an existing target test, including content-designated item subsets.
The results indicate that the algorithms provide reliable fit to the target in
terms of item parameters, test information functions and expected score
distributions. A discussion of the application is included.
Introduction

Advances in computer technology have generated a growing interest in test 
construction applications which take advantage of that technology. One such 
area of interest has been the use of computers to create parallel tests. 

In Item Response Theory (IRT), parallelism among tests, test forms or
subtests can in part be determined by what are termed item and test
information functions, among other criteria. IRT uses this concept of
information, conditional upon a latent ability, $\theta$, to determine measurement
precision. Contrasted with classical test theory, which derives a single
estimate of measurement accuracy via reliability and the standard error of
measurement, IRT uses the inverse of the square root of the information
function about the $\theta$s to denote measurement accuracy across an entire latent
ability metric.

This information is defined at the item level by

$$I_j(\theta_k) = \frac{[P_j'(\theta_k)]^2}{P_j(\theta_k)\,Q_j(\theta_k)}, \qquad (1)$$

where $P_j(\theta_k)$ is the probability of a correct response to item j at some
ability level, $\theta_k$, $Q_j(\theta_k) = 1 - P_j(\theta_k)$, and $P_j'(\theta_k)$ is the first
derivative of $P_j(\theta_k)$ with respect to $\theta_k$. Furthermore, the item
information, $I_j(\theta_k)$, is additive, which allows us to derive the information
for an entire test or subtest as

$$T(\theta_k) = \sum_{j=1}^{J} I_j(\theta_k). \qquad (2)$$


It must be noted, however, that $T(\theta_k)$ is merely the test information
function conditional upon some single level of ability, $\theta_k$. Because
the $\theta_k$ abilities are in reality distributed continuously on R, the real
number line $(-\infty, +\infty)$, we must extend our concern beyond some kth ability
point to an entire test information curve. The shape of and area under such
test information curves can then be used to determine a weak form of
parallelism among tests (Lord, 1977; Samejima, 1977). That is, tests (forms
or subtests having similar content and measuring the same latent trait) with
identical test information curves may be considered essentially to be
parallel. Therefore, if we can create different test forms with approximately
the same test information curves (and similar content), then our forms should
be reasonably parallel.

However, practical solutions to the problem of actually generating 
parallel tests via test information curves have demonstrated only limited 
success. Algorithms suggested by Theunissen (1985) and van der Linden and
Boekkooi-Timminga (1989), which employ zero-one linear programming to
maximize test information, tend to require large amounts of computing time and
remain limited for large-scale applications. Although parameter restrictions
and heuristics can be applied to the zero-one problem (e.g. Adema, 1988), a
trade-off of computer time versus accuracy tends to result.

Other techniques based upon more heuristic approaches (sort and search 
rule-based algorithms) more dramatically reduce computational loads but run 
the risk of operating with limited accuracy. For example, Ackerman (1989) was 
able to demonstrate the implementation of a strictly heuristic technique which 
prioritized item information based upon distance from a target test 
information curve. Under Ackerman's approach, pooled items were presorted at
various ability levels by descending information and those items which
contributed the most information at priority points on the test information
curve were assigned to test forms. Unfortunately, Ackerman's technique tended
to always choose the most discriminating items and usually overestimated the
target test information curves (i.e. produced more informative tests than
targeted).

What appears necessary, therefore, is a set of techniques which effect a
compromise between computational loads and purely heuristic approaches. This
paper focuses upon that specific problem: to determine a set of general
heuristics and algorithms which can be used to select J items from a pool of M
items (J < M) which minimize the difference between a target information curve
and the actual information curve formed by the J items, at some K points along
an ability distribution.

Derivation of the Item Selection Algorithm

We begin by defining $T_k$ as some amount of targeted test information,
conditional upon $\theta_k$ (k = 1, ..., K quadrature points). This target
information is assumed to represent the standard form of a test whose
properties we wish to match. We also define $\hat{T}_{jk}$ as the conditional
information with respect to the jth selected item (j = 1, ..., J; k = 1, ...,
K) such that

$$\hat{T}_{jk} = \sum_{i=1}^{j} I_i(\theta_k). \qquad (3)$$

Note that by prior definition of the test information, equation (2), $\hat{T}_{jk}$ is
merely an incremental sum of the item information, $I_j(\theta_k)$. To further
clarify equation (3), it is only for conceptual convenience that we distinguish
between $\hat{T}_{jk}$ as the approximation of the item information functions being
incrementally summed and $\hat{T}_k$ as the finished approximation of the
information function, conditional upon $\theta_k$ (i.e. $\hat{T}_k = \hat{T}_{jk}$ where
j = J).

As implied earlier, the ability distribution of $\theta$s used to condition the
test information curve is generally considered to span $(-\infty, +\infty)$; however, in
practice, K is usually kept to some small number of quadrature points (e.g.
K <= 31 on the interval [-3.0, +3.0]) minimally adequate for sampling the
cumulative information function (CIF, or cumulative density of the information
function conditional on $\theta$) at equal partitions.

Next, we need to consider the distances between the target function, $T_k$,
and the information function under construction, $\hat{T}_{jk}$. That distance is
given by

$$d_k = |T_k - \hat{T}_{jk}|, \qquad \begin{cases} d_k = 0 & \text{for } T_k \le \hat{T}_{jk}, \\ d_k = T_k - \hat{T}_{jk} & \text{for } T_k > \hat{T}_{jk}, \end{cases} \qquad (4)$$

which denotes the absolute difference (distance) between the target function,
$T_k$, and the approximation of the test information function, $\hat{T}_{jk}$.

We can now adjust $d_k$ to a partitioned distance corresponding, ideally, to
smooth growth in $\hat{T}_{jk}$, given $\theta_k$, as

$$\delta_k = \frac{d_k}{J - j + 1}, \qquad j = 1, \ldots, J. \qquad (5)$$

This partitioning of the information function at some point, k, assumes
that $\delta_k$ is the optimal information with which to evaluate the next J - j + 1
items. In short, $\delta_k$ becomes a moving average of the information selection
criteria and is adjusted at each iteration in the selection process.
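
The distance and moving-average steps can be sketched in a few lines of Python. This is only an illustration, not the ITEMSEL code: the function name and argument names are hypothetical, and it assumes that `items_selected` counts the items already chosen (j - 1), so that the divisor equals J - j + 1.

```python
def moving_average_distance(target, current, items_selected, test_length):
    """Return delta_k for every quadrature point.

    target         -- T_k, the target information at each point
    current        -- running information of the items chosen so far
    items_selected -- number of items already chosen (assumed to be j - 1)
    test_length    -- J, the number of items to select in total
    """
    remaining = test_length - items_selected  # J - j + 1
    if remaining <= 0:
        raise ValueError("no items left to select")
    deltas = []
    for t_k, t_hat in zip(target, current):
        # Distance to the target, set to zero once the target is reached.
        d_k = max(t_k - t_hat, 0.0)
        # Spread the remaining distance evenly over the remaining items.
        deltas.append(d_k / remaining)
    return deltas
```

With an empty form and J = 40, a target of 10.0 at some point gives a moving average of 0.25 there, i.e. the information the "average" next item should contribute at that point.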

There appear to be two sound reasons for using $\delta_k$. First, the averaging
process explicit in computing $\delta_k$ would appear to prevent extreme (and
arbitrary) growth in any one area of the curve. That is, items with maximal
or minimal information properties at any kth ability point will be less likely
to be chosen than items with less extreme information. Thus, averaging should
produce smooth growth in $\hat{T}_{jk}$ as opposed to sporadic growth which requires
continual and sometimes dramatic correction. Second, the dynamic nature of
computing $\delta_k$ at each jth selection iteration allows for constant "fine tuning"
along the $\theta_k$ (k = 1...K) points. In other words, error in estimating the
target function is accounted for directly by the algorithm as part of the next
set of distances from the target to be evaluated.

Once $\delta_k$ is derived, we use it to create a set of relative
weights, $\omega_k$, which will then be used to actually prioritize the information
at the K ability points being evaluated. The relative weights are determined by
normalizing the $\delta_k$'s across the K quadrature points, as given by

$$\omega_k = \frac{\delta_k}{\sum_{k=1}^{K} \delta_k}, \qquad (6)$$

where $\sum_{k=1}^{K} \omega_k = 1.0$. (In practice, $1 - \omega_k$ will serve as the actual
weight for reasons explained below.)

We now proceed to use $\delta_k$ and $\omega_k$ to evaluate the M - j + 1 items in the
item pool. Let $\xi_{mk}$ denote the absolute error difference between the
information of each mth item in the pool, evaluated at the kth ability point,
and $\delta_k$. That is,

$$\xi_{mk} = |I_m(\theta_k) - \delta_k|, \qquad (7)$$

where $\xi_{mk}$ might be called the error in fit of the M - j + 1 items in the
unused item pool to $\delta_k$. It should be noted that in some sense $\xi_{mk}$ is an
arbitrary measure of the relative estimation error during the process of
selecting items. Accordingly, rank ordering the absolute differences
between $I_m(\theta_k)$ and $\delta_k$ or squaring that difference might each be
suggested as plausible alternatives for arriving at $\xi_{mk}$. However,
only $\xi_{mk}$ in its form as the absolute difference retains the scale
properties of the information functions under evaluation. In short, any
derivation of $\xi_{mk}$ except by using the absolute difference would introduce
additional, arbitrary and probably unwanted weighting of the item information
along the K ability points.

Finally, to determine the selection of the jth item, given M - j + 1
items, we need to create a composite selection value for each of the pooled
items as a sum of each weighted relative error (i.e. a sum of the product
of $1 - \omega_k$ and $\xi_{mk}$), across the K ability points. Note that the use
of $1 - \omega_k$ in place of $\omega_k$ merely guarantees that the weighting and the
relative error in fit, $\xi_{mk}$, remain in the same direction. By summing the
weighted relative errors, we arrive at an adjusted item selection composite (of
the fit to smooth growth in $\hat{T}_{jk}$) for the M - j + 1 items remaining in the
pool. That adjusted item fit selection composite, $S_m$, is given by

$$S_m = \sum_{k=1}^{K} (1 - \omega_k)\,\xi_{mk}. \qquad (8)$$


During each iteration of the selection cycle, the item with the smallest
value of $S_m$ (i.e. least overall error, weighted by information importance) is
chosen from the M - j + 1 pool, j is incremented and the process continues
until j = J or until a specified degree of accuracy in approximating $T_k$ (k =
1...K) is attained. Finding the item with the minimum value of $S_m$ (per
iteration) therefore serves as the primary heuristic to be used during the
selection process.
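
The whole selection cycle can be put together as a short Python sketch. It is a minimal illustration under stated assumptions rather than the ITEMSEL implementation: the function name and data layout are hypothetical, ties are broken by taking the lowest item index, and the "specified degree of accuracy" stopping rule is simplified to stopping once the target is met at every quadrature point.

```python
def select_items(item_info, target, test_length):
    """Greedy selection by the smallest weighted error in fit.

    item_info[m][k] -- information of pool item m at quadrature point k
    target[k]       -- target information T_k
    test_length     -- J, the number of items to select
    Returns (chosen item indices, information of the assembled form).
    """
    available = set(range(len(item_info)))
    current = [0.0] * len(target)  # running information of the form
    chosen = []
    for n_chosen in range(test_length):
        if not available:
            break
        remaining = test_length - n_chosen  # J - j + 1
        # Moving-average distance to the target at each point.
        delta = [max(t - c, 0.0) / remaining for t, c in zip(target, current)]
        total = sum(delta)
        if total == 0.0:  # assumed stopping rule: target met everywhere
            break
        weights = [1.0 - d / total for d in delta]  # 1 - omega_k

        def score(m):
            # Sum over points of (1 - omega_k) * |I_m(theta_k) - delta_k|.
            return sum(w * abs(i - d)
                       for w, i, d in zip(weights, item_info[m], delta))

        best = min(sorted(available), key=score)  # ties: lowest index wins
        available.remove(best)
        chosen.append(best)
        current = [c + i for c, i in zip(current, item_info[best])]
    return chosen, current
```

Note that with a single quadrature point the weight $1 - \omega_k$ is zero, so the sketch is only meaningful for K of 2 or more. In the toy pool below, the two items that each carry exactly half of the target are picked, and the highly informative item is passed over.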
Dealing with Item Subsets and Subtests

One assumption implicit in the algorithm described in the prior section
is that the target curve is comprised of fairly homogeneous items. That is,
in building $\hat{T}_{jk}$ (see equation [3]), the item information functions are
essentially compared to a criterion of an average information function for
each of J items (conditional upon the quadrature points, $\theta_k$, k = 1...K). In
certain circumstances, this assumption may not be tenable. Where a target
curve is established as a composite of subsets of items from an existing test
or from item specifications (e.g. subtests categorized by content area and/or
some other criteria), the categorical subsets may have different information
distributional properties, i.e. moments of the information curves, than the
overall target information function.

In these situations, multiple targets can be used in a two-stage fitting
procedure. Essentially, the method involves fitting each categorical or 
criterial subtarget in the first stage and then grouping the selected item 
subsets in a second stage to fit an overall targeted test information 
function. 

In the first stage of this procedure, we presume to fit a subtarget, $T_{rk}$,
conditional upon $\theta_k$, comprised of J(r) items for r = 1...R subsets of items
such that

$$T_k = \sum_{r=1}^{R} T_{rk}, \qquad k = 1 \ldots K. \qquad (9)$$

Thus, the subtarget represents an allowable partitioning of the information
function in the overall target, given $\theta_k$. In judging the fit of J(r) items to
the subtarget, $T_{rk}$, the item selection score, given by equation (8), is now
denoted as $S_{rm}$ corresponding to the [restricted] subset of items in the
pool. We then independently fit a subset of items, $\hat{T}_{J(r)k}$, to each $T_{rk}$
subtarget (k = 1...K, r = 1...R), where

$$\hat{T}_{J(r)k} = \sum_{j=1}^{J(r)} I_{j(r)}(\theta_k), \qquad k = 1 \ldots K. \qquad (10)$$


After all R subsets of J(r) items have been fitted to each subtarget,
$T_{rk}$, we proceed to the second stage of fitting. In this stage, we use the
subsets of the J(r) selected items as the basic units of comparison. The
selection algorithm proceeds as described in equation (8) but now compares the
composite fit of the R subsets of selected J(r) items, or $\hat{T}_{J(r)k}$, to the
overall target $T_k$. This item subset score is given by

$$S_{J(r)} = \sum_{k=1}^{K} (1 - \omega_k) \left| \sum_{j=1}^{J(r)} I_{j(r)}(\theta_k) - \delta_k \right|, \qquad (11)$$

where

$$\delta_k = \frac{T_k - \sum_{r} \hat{T}_{J(r)k}}{R - r + 1}, \qquad (12)$$

with the sum in the numerator taken over the subsets selected so far, with
restrictions identical to those given in equations (4) and (5), and where
$\omega_k$ is defined and used as shown in equations (6) and (8). Therefore, the
subset of J(r) items which minimizes the weighted sum of information to the
average growth in the conditional curve being fitted is selected for r = 1...R
cycles.

Multiple Parallel Test Forms 

Multiple parallel test forms can be constructed in the same manner as a
single test form. The major difference lies in the need to consider $\hat{T}_{jkq}$ (j =
1...J, k = 1...K, q = 1...Q), where Q is the number of test forms being fit to
the target, $T_k$. Furthermore, by rotating the order of the form being fit (q)
at each jth item selection iteration and controlling for duplication of item
selection across forms, the assignment of items (based upon their information
fit to $T_k$) can be essentially equalized across test forms.
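
One way to read the rotation scheme is sketched below in Python. The details are assumptions made for illustration only: the form that picks first is shifted by one at each selection iteration, every form draws from a single shared pool (which is what prevents duplicate assignments), and the function and argument names are hypothetical.

```python
def select_parallel_forms(item_info, target, test_length, n_forms):
    """Assign items from one shared pool to n_forms forms, rotating form order.

    item_info[m][k] -- information of pool item m at quadrature point k
    target[k]       -- target information T_k shared by all forms
    """
    available = set(range(len(item_info)))
    forms = [[] for _ in range(n_forms)]
    current = [[0.0] * len(target) for _ in range(n_forms)]

    def score(m, delta, weights):
        # Weighted absolute error between item information and delta_k.
        return sum(w * abs(i - d)
                   for w, i, d in zip(weights, item_info[m], delta))

    for n_chosen in range(test_length):
        remaining = test_length - n_chosen
        # Rotate which form picks first so no form always gets the best fit.
        order = [(n_chosen + q) % n_forms for q in range(n_forms)]
        for q in order:
            if not available:
                return forms
            delta = [max(t - c, 0.0) / remaining
                     for t, c in zip(target, current[q])]
            total = sum(delta)
            if total == 0.0:
                continue  # this form already meets the target everywhere
            weights = [1.0 - d / total for d in delta]
            best = min(sorted(available),
                       key=lambda m: score(m, delta, weights))
            available.remove(best)  # shared pool: no item is used twice
            forms[q].append(best)
            current[q] = [c + i for c, i in zip(current[q], item_info[best])]
    return forms
```

In the small example used to check the sketch, the first form picks first in the first iteration and second in the next, so neither form is systematically favored.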

Methods 

Implementation 

All algorithms and heuristics discussed in the prior section were 
formally implemented in an IBM-compatible microcomputer-based package called 
ITEMSEL. This integrated software consists of 10 menu-driven program modules 
written in Microsoft QuickBasic 4.0 (1987) by the first author. ITEMSEL
features EGA/VGA graphics for on-screen presentation of the selection process 
and provides a wide variety of item data base modules and file handling 
utilities which facilitate the item selection process. The software package 
also fully supports the construction of multiple test forms, the use of 
multiple subtargets for dealing with content subtests or subsets of items and 
even allows user submitted item substitutions. 

The basic process of using ITEMSEL involves user inputs of an item pool 
file, a target information file, related control inputs such as the size of J
or J(r) (the number of items to be selected) and content filter values/text.
Selected items are retained in additional files where optimization of the 
fitting process can occur or from which optional combining of item subtests 
can be accomplished. 

The programs assume a 3-parameter IRT or logistic model for purposes of
computing all information quantities. Under that model, the probability of a
correct response to item j, conditional on ability, $\theta_k$, is given by

$$P_j(\theta_k) = c_j + (1 - c_j)\left\{1 + e^{-D a_j (\theta_k - b_j)}\right\}^{-1}, \qquad (13)$$

where $c_j$ is the lower asymptote parameter, $a_j$ is the discrimination parameter
and $b_j$ is the item difficulty. D is a constant equal to approximately 1.702
and used for scaling $\theta$ under the logistic model.
Data Specifications 

An item pool consisting of 600 mathematics items from ACT testing
programs was selected to investigate the use of the $S_m$ and $S_{rm}$ algorithms as
implemented by the ITEMSEL program. Of these, 520 items were from 13 previously
administered ACT Assessment Program (AAP) Mathematics tests. An additional 80
items were drawn from the Collegiate Mathematics Placement Program (CMPP).
Item parameters for all 600 items were derived from a three-parameter logistic 
calibration performed using LOGIST IV (Wingersky, Barton and Lord, 1982) and 
scaled to a common ability metric using equivalent groups. 

The 40 items which comprised the AAP Mathematics Form 26A were selected as
the overall test target curve to remain consistent with a previously noted
study conducted by Ackerman (1989). These 40 items were also included in the
item pool. The Form 26A target curve was fit by evaluating the test
information at K = 31 quadrature points on the $\theta$ interval [-3.0, +3.0]. The
cumulative information function (CIF) was equally partitioned (based upon an
integration of 1000 $\theta$ points) to locate the 31 points. That is, points were
selected which divided the information curve into equal area partitions.
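
The equal-area placement of quadrature points can be sketched as follows. This is one plausible reading, not the documented ITEMSEL procedure: it assumes a midpoint-rule integration over the fine grid and places the K points where the cumulative area reaches k/(K + 1) of the total, so that the points split the curve into K + 1 equal areas. The function name and defaults are hypothetical.

```python
def equal_area_points(info, n_points=31, lo=-3.0, hi=3.0, n_grid=1000):
    """Locate n_points ability values that cut the area under info() equally.

    info -- a function returning test information at a given theta
    """
    step = (hi - lo) / n_grid
    mids = [lo + (i + 0.5) * step for i in range(n_grid)]
    areas = [info(theta) * step for theta in mids]  # midpoint rule
    total = sum(areas)
    points = []
    cumulative = 0.0
    k = 1
    for theta, area in zip(mids, areas):
        cumulative += area
        # Record a point each time the running area crosses the next cut.
        while k <= n_points and cumulative >= total * k / (n_points + 1):
            points.append(theta)
            k += 1
    return points
```

For a flat information curve the points come out evenly spaced; for a peaked curve they cluster where the information is concentrated, which is the purpose of partitioning by area rather than by distance.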

Additionally, the six content areas which comprise Form 26A of the AAP
Mathematics test were used to generate six corresponding subtargets. The CIF
of each subtarget was likewise partitioned independently when generating the K
= 31 quadrature points. These Form 26A subtest content areas contained the
following numbers of items (for purposes of computing the information
functions and generating subsequent subsets of items): AAR = 14 items, AAO =
4 items, G = 8 items, IA = 8 items, NNS = 4 items and AT = 2 items.

Item Selection Procedures

The ITEMSEL microcomputer program was employed in a two-stage set of
fitting procedures meant to generate six independent forms of the AAP
Mathematics test. In the first stage, six forms of each of the content areas
(AAR, AAO, G, IA, NNS and AT) were initially fit to the Form 26A subtest
information targets. ITEMSEL thus generated a total of 36 content-restricted
item subsets. In the second stage of fitting, an "optimizer" module in the
ITEMSEL system was used to identify and combine composite groupings of the
content-restricted item subsets which fit the overall Form 26A target
information curve to produce six independent forms of the AAP Mathematics test
(see Dealing with Item Subsets and Subtests). That is, each of the six
generated total test forms was created as a summation of the unique AAR, AAO,
G, IA, NNS and AT subsets of items which "best" fit the overall Form 26A
target curve.

The generation of multiple forms during both stages of item selection was
performed as a simultaneous operation. As described earlier, ITEMSEL
automatically rotated the form indices as each item or item subtest was
selected to ensure equalization of the item/subtest selection process across
forms.

Results 

In the present study, six forms of 40 items each were generated by
ITEMSEL using the 600 items in the math pool and the Mathematics test Form 26A
target information values conditional on K = 31 quadrature points of $\theta$. In
assessing the quality of the algorithms to fit the Form 26A target, a number of
considerations and comparisons are presented.
Summary of IRT Item Parameters

The IRT item parameters (discrimination, difficulty and the lower 
asymptote) provide an important starting point in consideration of the item 
selection process. Assuming that the test target represents an ideal 
composite of items, we would expect that the items selected or fitted via the 
ITEMSEL program should demonstrate similar distributions of the item 
parameters to those present in the target specifications or test. 

A summary of the means and standard deviations of the item parameters is 
presented in Table 1. This table compares the distributional properties of 
the parameters for each of the six generated AAP Mathematics test Forms (A-F) 
with the Mathematics test Form 26A target parameters. In general, the 
apparent trend of the parameters suggests a very slight tendency (with one 
exception, Form F) by ITEMSEL toward overfitting the average item 
discrimination parameters (a) and toward choosing items with nominally higher 
mean difficulty parameters (b). 



Insert Table 1 about here 



The net result appears to be, therefore, a tendency for ITEMSEL to spread 
out the information (i.e. produce a more platykurtic distribution of 
information). Given the explicit averaging of the conditional information 
functions, via the $S_m$ algorithm, this minor distributional difference seems
quite reasonable. It should also be noted that despite the minor 
distributional differences between the item parameters of the target test and 
those of the selected test forms, ITEMSEL was nonetheless very consistent in 
matching item parameters among Forms A through F of the test. 




As an additional comparison, consider Table 2 which shows the means and
standard deviations of the IRT parameters from 12 manually constructed
Mathematics test Forms (i.e. actual forms prepared by ACT test development
staff). Table 2 would appear to provide strong evidence of a greater degree
of variation in the types of items which were manually selected across forms
than was present in the computer-selected forms summarized in Table 1. It
should be noted, however, that these 12 manually constructed test forms did
not use target test information as the objective criterion.



Insert Table 2 about here 



Goodness-of-Fit

In addition to the descriptive summary of the item parameters, we can
also consider the test information curves themselves. As shown in Figure 1,
all six selected Mathematics Test Forms (A-F, 40 items each) demonstrated
quite similar patterns of information. That similarity is perhaps even more
evident in terms of the means and variances of the information curves (for
which estimates of the expectations can be derived across the 31 quadrature
points of $\theta$). For the target test, Form 26A, the mean information across the
31 quadrature points of $\theta$ was 21.67. Comparatively, the average of the
expected means of the test information curves for the six selected test forms
(A-F) was 21.54. Likewise, the approximate variance of the Form 26A target
information curve for 31 quadrature points was 152.53. This compares to an
average variance of 161.55 for the Form A-F test information curves.
Therefore, the general indication is that the information curves from the six
selected test forms were essentially centered at the same point as the target
curve, but with nominally larger variances.



Insert Figure 1 about here



Figure 2 presents the subsets of items selected by ITEMSEL to fit the 
individual content area subtargets (AAR, AAO, G, IA, NNS and AT). Some 
caution is warranted, however, when reviewing these content-specific graphs of 
the item subsets. The apparent differences in the curves across content areas 
must take into account the scaling of the ordinate axes. For example, the AT 
Forms appear to demonstrate a greater lack of fit than the AAR Forms.
However, if we consider the ordinate axes of the AT curves versus the AAR 
curves, it should be obvious that the real differences between the AT curves 
(2 items per subtest form) are actually as small or smaller than the 
differences between the AAR curves (14 items per subtest form). 



Insert Figure 2 about here



In judging the actual degree of fit between curves, a more useful set of
goodness-of-fit indices (beyond visual inspection) seems needed. Table 3
presents four such indices for the six AAP Math Forms fit to the Form 26A
target information.



Insert Table 3 about here



The unweighted average absolute difference (|UAD|) represents the mean of
the unsigned differences between the curves, as given by

$$|UAD| = \frac{\sum_{k=1}^{K} |T_k - \hat{T}_k|}{K}. \qquad (14)$$

The unweighted root mean square (URMS) index represents the square root of the
mean squared deviations between the fitted and target curves along the
quadrature points. That is,

$$URMS = \sqrt{\frac{\sum_{k=1}^{K} (T_k - \hat{T}_k)^2}{K}}. \qquad (15)$$



The weighted mean square (WMS) is similar to the URMS, but uses a normalized
weighting of the standardized true scores, given each quadrature point, to
essentially scale the information differences to the expected score density of
the $\theta$ metric for the selected items. Therefore, the weighted mean square is
given by

$$WMS = \sum_{k=1}^{K} \phi(\tau_k)\,(T_k - \hat{T}_k)^2, \qquad (16)$$

where

$$\phi(\tau_k) = \frac{e^{-\tau_k^2/2}}{\sum_{k=1}^{K} e^{-\tau_k^2/2}}, \qquad (17)$$

and

$$\tau_k = \frac{\sum_{j=1}^{J} P_j(\theta_k) - \sum_{k=1}^{K}\sum_{j=1}^{J} P_j(\theta_k)/K}{\sqrt{\dfrac{K \sum_{k=1}^{K}\left[\sum_{j=1}^{J} P_j(\theta_k)\right]^2 - \left[\sum_{k=1}^{K}\sum_{j=1}^{J} P_j(\theta_k)\right]^2}{K(K-1)}}}, \qquad (18)$$


given $P_j(\theta_k)$ as the probability of a correct response to item j, conditional
upon $\theta_k$. Finally, delta ($\Delta$) is given as a squared difference weighted by the
normalized information functions (densities) of the target function.
Accordingly, delta becomes

$$\Delta = \sum_{k=1}^{K} W_k\,(T_k - \hat{T}_k)^2, \qquad (19)$$

where

$$W_k = \frac{T_k}{\sum_{k=1}^{K} T_k}. \qquad (20)$$
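
Three of the four indices are simple enough to sketch directly in Python. The sketch assumes the definitions given in the text: |UAD| is the mean unsigned difference, URMS is the root of the mean squared difference, and delta weights each squared difference by the target information normalized to sum to one. The function names are hypothetical, and WMS is left out because its true-score weights need the item parameters as well as the two curves.

```python
import math


def uad(target, fitted):
    """Unweighted average absolute difference between two information curves."""
    return sum(abs(t - f) for t, f in zip(target, fitted)) / len(target)


def urms(target, fitted):
    """Unweighted root mean square difference between two information curves."""
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)) / len(target))


def delta_index(target, fitted):
    """Squared differences weighted by the normalized target information."""
    total = sum(target)
    return sum((t / total) * (t - f) ** 2 for t, f in zip(target, fitted))
```

Because delta gives the most weight where the target is most informative, a misfit at the peak of the curve counts for more than the same misfit in the tails.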



By themselves, the four goodness-of-fit indices provided in the upper
half of Table 3 imply both weighted and unweighted functions of various forms
of the average unsigned differences between the Form 26A target curve and the
selected test information curves (i.e. the curves for Forms A-F). However, to
put these indices in a different perspective, we might consider these indices
as proportions of an information function, conditional upon some value of $\theta$.
To do so merely requires dividing the value of the index in Table 3 by the
information function at some point along the $\theta$ metric (e.g. the mean
information for the Form 26A target test of 21.67). For example,
the |UAD|, URMS, WMS and $\Delta$ values (0.709, 0.849, 0.874 and 0.943) in the first
row of Table 3 could be seen to represent proportional differences between the
Form A curve and the target curve ranging from 3.28% to 4.36%, at the point of
average test discrimination. These proportional differences, conditional upon
the mean information in the Form 26A target curve, are provided in parentheses
below each goodness-of-fit index in Table 3. The basic implication is that
the fit between the information curves is actually far better than the indices
in the upper half of Table 3 might suggest on the surface. That is, the
apparent functional differences taken as relative ratios (proportions) to the
amount of average information in the target curve (e.g. 3.1% to 5.0% in terms
of |UAD|) are essentially inconsequential.

As another method of assessing the goodness-of-fit, we might consider the
relationship between the test information and the standard error of the latent
abilities, $\theta$, given by

$$\sigma_e(\theta) = \frac{1}{\sqrt{\sum_{j=1}^{J} I_j(\theta)}}.$$

Using this relationship, it becomes possible to restate the goodness-of-fit
statistics as weighted functions of the average unsigned differences between
the standard errors conditional on $\theta$. These standard error differences are
provided in the lower half of Table 3.
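
The conversion from information to standard error is a one-line computation, sketched here in Python with hypothetical function names. The second function returns the unsigned standard error differences at each quadrature point, which is the raw material for the restated indices; how those differences are then averaged or weighted follows the index definitions given earlier.

```python
import math


def standard_error(information):
    """Standard error of ability: inverse square root of the test information."""
    return 1.0 / math.sqrt(information)


def standard_error_differences(target, fitted):
    """Unsigned differences between target and fitted standard errors, point by point."""
    return [abs(standard_error(t) - standard_error(f))
            for t, f in zip(target, fitted)]
```

At the mean target information of 21.67 reported above, this gives a standard error of roughly 0.215 on the ability scale. The same difference in information produces a much larger standard error difference where information is low, which is why the tails of the curve dominate the unweighted indices.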

The unweighted absolute average difference (|UAD| of the standard errors) and
the unweighted root mean square (URMS of the standard errors) obviously
appear larger than the weighted mean square (WMS) and delta ($\Delta$) of the
standard errors. The reason has to do with the larger standard errors on
$\theta$ at the asymptotes of the information curves. Because both the |UAD| and
URMS standard error indices treat all quadrature points of $\theta$ equally, both
statistics essentially inflate the apparent unsigned average differences
between the standard errors for the target versus fitted curves. The URMS
further takes the square root, which inflates the difference even more for
values between 0 and 1. The WMS and $\Delta$ standard error indices, therefore,
appear to be more meaningful in that both tend to limit the impact of standard
error differences for $\theta$ values near the asymptotes. This is especially true
if we consider that the seemingly largest differences between fitted forms and
the target information functions occurred for Form F (referring to the upper
half of Table 3). However, considering the weighted differences between the
standard errors (lower half of Table 3), the differences are negligible.

Expected Score Differences

The final determinants of the adequacy and accuracy in fitting a target
test using the $S_m$ and $S_{rm}$ algorithms (as implemented in the ITEMSEL software)
are the expected score distributions obtained from the various tests. That
is, if we consider the issue of parallelism among test forms to extend beyond
our objective function (test information), then we must also consider what the
score distributions of the fitted test forms will look like in comparison to
the target test (AAP Math Form 26A, in this case).

Figure 3 presents the test characteristic curves (TCCs) for each of the
six fitted test forms along with the TCC for Form 26A. These TCCs are defined
by the sum of the conditional probabilities for all items in a test across
the $\theta$ metric. That is,

$$T(\theta) = \sum_{j=1}^{J} P_j(\theta), \qquad (21)$$

where $P_j(\theta)$ is the probability of a correct response to item j, conditioned
upon $\theta$ (see equation [13]). $T(\theta)$ therefore defines the expectation of a
random individual's true score on J items, given his/her ability level (Lord,
1980).



Insert Figure 3 about here 
Quite clearly, Figure 3 demonstrates a very close correspondence between
true scores across the fitted forms of the AAP Math test and Form 26A.
Additionally, the differences between predicted score distributions can be
compared by converting the true scores to a discrete number-right scale. In
the present study, predicted scores were obtained by assuming a (0,1) normal
distribution on $\theta$. Table 4 provides the means, standard deviations, skewness
and kurtosis values of the predicted score distributions for the six AAP Math
test forms fitted by ITEMSEL and the Form 26A target test. Classical item
p-values and biserial correlations and their standard deviations are also shown
in Table 4.



Insert Table 4 about here



Table 4 provides fairly clear evidence of parallelism among the six fitted
forms and the target test, not only in terms of predicted means and standard
deviations, but also skewness and kurtosis. In other words, the process of
fitting the target information was sufficient to fit the expected and
predicted score distributions for the present item pool. Finally, as
suggested by the mean p-values and biserial correlations (and their standard
deviations), the $S_m$ and $S_{rm}$ algorithms also seem to satisfy classical testing
theory criteria for parallelism.

Microcomputer Timed Performance

ITEMSEL was run on a Compaq 386/33 microcomputer for the present study. 
As such, resulting performance indicators are perhaps optimistic ones for most 
microcomputer environments. Also, due to the interactive nature of the 
fitting process, user skill greatly enters into the assessment of timed 
performance. Nonetheless, several timing indices can be stated. 

The entire process of constructing the six AAP forms, including all user 
inputs, fitting of subtargets and optimization of item content-designated 
subsets to the overall target information curve, ranged from 15 to 20 minutes 
in multiple trials. This compares to informal estimates made by ACT test 
development staff of about 170 hours to accomplish the same task manually 
(Noble, 1990). Of course, the 170 hours would also include formulating 
additional constraints and making qualitative judgements about the constructed 
forms beyond the test information fit criteria. 

In terms of more precise time estimates, the fitting of the six item 
subsets (six forms each) ranged from 1.2 to 10.9 seconds, depending upon the 
number of items. The process of choosing optimal subsets took 1.5 seconds of 
CPU time. Comparatively, fitting six forms of the overall Form 26A target 
curve (without content breakdowns) used 70.7 seconds of CPU time on the same 
Compaq 386/33 microcomputer. It should be noted, however, that these timing 
values also include the generation of graphics displays during all selection 
stages. 

Discussion 

The S_m and S_rm algorithms were introduced as viable methods for fitting 
test items to a target information curve. Both algorithms use the criterion 
of a moving average of the conditional distance to the target function, across 
quadrature points of θ. Items are then selected by use of a weighted 
composite score which assesses their fit to the criterion. 
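The general idea of a moving-average criterion can be sketched in a few lines of Python. This is not the S_m or S_rm derivation itself: the squared-distance composite, the uniform default weights, the function name `select_items`, and the simulated item pool are all assumptions made for illustration. At each step the remaining distance to the target is averaged over the items still to be chosen, and the unused item closest to that average across all quadrature points is selected in a single pass.

```python
import numpy as np

def select_items(pool_info, target, n_items, weights=None):
    """Greedy one-pass selection against a moving-average criterion (illustrative)."""
    pool_info = np.asarray(pool_info, dtype=float)  # (N items, K quadrature points)
    target = np.asarray(target, dtype=float)        # (K,) target information values
    n_pool, n_points = pool_info.shape
    if n_items > n_pool:
        raise ValueError("n_items exceeds the size of the item pool")
    if weights is None:
        weights = np.full(n_points, 1.0 / n_points)  # assumed uniform weights
    current = np.zeros(n_points)
    available = np.ones(n_pool, dtype=bool)
    chosen = []
    for step in range(n_items):
        remaining = n_items - step
        # Average information each remaining item should still contribute
        criterion = (target - current) / remaining
        # Weighted composite of squared distances over all quadrature points
        score = ((pool_info - criterion) ** 2 * weights).sum(axis=1)
        score[~available] = np.inf
        j = int(np.argmin(score))
        chosen.append(j)
        available[j] = False
        current += pool_info[j]
    return chosen, current

# Hypothetical pool: 50 bell-shaped item information curves on 31 points
theta = np.linspace(-3.0, 3.0, 31)
rng = np.random.default_rng(0)
peaks = rng.uniform(-2.0, 2.0, size=50)
heights = rng.uniform(0.2, 1.0, size=50)
pool = heights[:, None] * np.exp(-0.5 * (theta - peaks[:, None]) ** 2)
target = 4.0 * np.exp(-0.5 * (theta / 1.5) ** 2)

chosen, fitted = select_items(pool, target, n_items=10)
```

Because the criterion is recomputed after every selection, an item that overshoots the target at some θ lowers the criterion there for the following picks, which is one reading of how error in fit is absorbed and redirected without iteration.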

This approach appears to demonstrate three distinct benefits. First, the 
moving average criterion, as a form of an objective function, absorbs and 
redirects error in fit, thus allowing for a non-iterative solution. The result 
is a reasonably fast method of fitting any target information curve. Second, 
the algorithms simultaneously consider all quadrature points which define the 
test information curves and upon which the information functions are 
conditional. That is, the entire information curve is always fit in the 
process of selecting items or item subsets. Finally, the algorithms can be 
conveniently extended for use with subtests/subtargets, item subsets and 
multiple test forms. 

In general, ITEMSEL was able to produce six test forms which reasonably 
matched the Form 26A target test along multiple levels of criteria. For 
example, IRT item parameters were shown to closely correspond to the 
parameters in the target test; more closely, in fact, than the parameters 
derived from existing, manually constructed forms of the Mathematics test. 
Other criteria denoting the fit of the selected test forms to the target test 
(e.g., comparisons of the actual information curves) likewise demonstrated a 
strong association between forms. 

The crucial point appears to be that ITEMSEL was able to successfully 
generate test forms with similar information curves. This was even shown to 
be the case when extending the notion of parallelism to expected score 
distributions and classical item parameters. 

The process is, of course, far from perfect. Nonetheless, from an 
applied viewpoint: (a) the method is fast (which makes it feasible for 
microcomputer technology, even for large scale applications) and (b) it 
appears to be at least as accurate as manual test construction methods given 
the constraints of this study. When implemented as part of an integrated 
software package such as ITEMSEL, these methods should readily complement the 
test construction process. This applied viewpoint defines the final intent 
behind the methods described in this paper. 






Author Notes 

Partitioning the information CDF into equal areas essentially 
prioritizes the quadrature points of θ relative to the conditional information 
densities. Accordingly, the concentration and spread of θ corresponds closely 
to the actual distributional properties of the test information function. 
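A minimal sketch of equal-area partitioning follows. It assumes the information function is available on a fine θ grid, accumulates its area with the trapezoid rule, and takes the midpoint of each equal-area slice as a quadrature point; the midpoint rule, the function name `equal_area_points`, and the bell-shaped example curve are assumptions, not details confirmed by the note above.

```python
import numpy as np

def equal_area_points(theta_grid, info, k):
    """Return k theta values at the midpoints of k equal-area slices of info."""
    theta_grid = np.asarray(theta_grid, dtype=float)
    info = np.asarray(info, dtype=float)  # must be positive on the grid
    # Trapezoid-rule cumulative area, rescaled to run from 0 to 1
    steps = 0.5 * (info[1:] + info[:-1]) * np.diff(theta_grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    probs = (np.arange(k) + 0.5) / k  # midpoint of each equal-area slice
    return np.interp(probs, cdf, theta_grid)

# Hypothetical bell-shaped information curve on a fine grid
grid = np.linspace(-4.0, 4.0, 801)
info = 0.05 + 20.0 * np.exp(-0.5 * grid ** 2)
points = equal_area_points(grid, info, k=31)
```

With a peaked information curve the resulting points cluster where information is high and thin out in the tails, which matches the prioritization the note describes.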
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Table 1 



                        Means           Standard Deviations       Skewness               Kurtosis
Form                 a     b     c        a     b     c        a      b      c        a      b      c

AAP Math 26A       1.03  0.29  0.16     0.40  0.60  0.04     0.92  -0.62   0.03     0.19  -0.47   1.19
(target set of items)
Form A             1.03  0.35  0.17     0.29  0.52  0.06     0.87  -0.20   0.52     0.96  -0.88   0.20
Form B             1.05  0.35  0.17     0.29  0.50  0.06     1.03  -0.31   2.17     1.68  -0.79   9.65
Form C             1.05  0.31  0.17     0.30  0.54  0.05     0.71  -0.12   0.25     1.09  -1.06  -0.25
Form D             1.05  0.32  0.16     0.29  0.55  0.06     1.46  -0.11   0.81     2.56  -0.66   1.50
Form E             1.04  0.31  0.17     0.28  0.50  0.06     1.25  -0.49  -0.32     1.88  -0.64   0.09
Form F             1.01  0.32  0.15     0.29  0.50  0.05     0.68   0.13  -0.27     0.88  -0.94   0.27


Table 2 

Means and Standard Deviations of IRT Parameters for 
12 AAP Math Forms (Manually-Constructed) 

(N = 40 Items) 

Test Form          a                b                c

Form 24B      1.058 (0.296)    0.309 (0.661)    0.160 (0.084)
Form 25B      0.994 (0.247)    0.395 (0.973)    0.159 (0.079)
Form 25C      1.078 (0.379)    0.359 (0.744)    0.157 (0.077)
Form 25D      1.068 (0.353)    0.321 (0.830)    0.142 (0.079)
Form 25E      1.057 (0.259)    0.307 (0.633)    0.128 (0.055)
Form 25F      0.950 (0.370)    0.385 (0.863)    0.152 (0.062)
Form 26B      0.989 (0.358)    0.240 (0.875)    0.172 (0.046)
Form 26C      0.930 (0.365)    0.328 (0.876)    0.162 (0.034)
Form 26D      0.951 (0.427)    0.392 (1.283)    0.185 (0.026)
Form 26E      0.972 (0.297)    0.254 (0.777)    0.166 (0.048)
Form 26F      0.926 (0.365)    0.342 (0.953)    0.159 (0.034)
Form 27A      0.990 (0.394)    0.332 (0.868)    0.178 (0.046)

( ) = Std. Deviation 

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Table 3 

Goodness-of-Fit Indices (to Form 26A Target) 

Information Function Indices 

Test Form     |UAD|            URMS             WMS              A

Form A        [illegible]      [illegible]      0.874 (0.040)    0.943 (0.044)
Form B        [illegible]      [illegible]      0.637 (0.029)    0.644 (0.030)
Form C        0.816 (0.038)    0.949 (0.044)    1.018 (0.047)    1.051 (0.049)
Form D        0.733 (0.034)    0.889 (0.041)    0.733 (0.034)    0.688 (0.032)
Form E        0.670 (0.031)    0.781 (0.036)    0.655 (0.030)    0.655 (0.030)
Form F        1.078 (0.050)    1.276 (0.059)    1.885 (0.087)    1.944 (0.090)

SE(θ) Indices 

Test Form     |UAD SE(θ)|    URMS SE(θ)    WMS SE(θ)      A SE(θ)

Form A        0.040          0.159         0.00b (sic)    0.0005
Form B        0.053          0.219         0.011          0.0010
Form C        0.039          0.146         0.004          0.0005
Form D        0.041          0.161         0.005          0.0006
Form E        0.039          0.150         0.005          0.0005
Form F        0.031          0.089         0.002          0.0003

( ) = Proportion of mean information in the Form 26A target curve (21.67). 
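The parenthesized entries can be checked with simple arithmetic: each is the index value divided by the mean target information of 21.67. The short Python sketch below applies this to the Form C row of the information function indices; the helper name `proportion_of_mean_info` is hypothetical, and rounding to three decimals is assumed from the way the table is printed.

```python
TARGET_MEAN_INFO = 21.67  # mean information in the Form 26A target curve (table note)

def proportion_of_mean_info(index_value, mean_info=TARGET_MEAN_INFO):
    """Express a fit index as a proportion of the mean target information."""
    return round(index_value / mean_info, 3)

# Form C information function indices as printed in Table 3
form_c = [0.816, 0.949, 1.018, 1.051]
form_c_props = [proportion_of_mean_info(v) for v in form_c]
```

Read this way, even the least well fitting entry in the table (Form F, 1.944) amounts to about 9 percent of the mean target information.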

Table 4 

Predicted Score Distributions for Six Fitted Test Forms and Target Form 26A 

Test Form       Mean p   S_p    Mean r_bis   S_r    Mean X    S_X     Skew   Kurtosis

26A (Target)    .495     .126   .591         .079   19.825    8.937   .369   -.812
A               .496     .117   .585         .065   19.840    8.926   .361   -.808
B               .493     .117   .586         .070   19.734    8.959   .365   -.844
C               .492     .128   .588         .064   19.686    8.913   .331   -.826
D               .497     .128   .598         .070   19.874    9.071   .333   -.845
E               .489     .102   .594         .070   19.551    9.171   .337   -.878
F               .503     .111   .594         .081   20.139    9.117   .330   -.846

Figure 1. Test Information Curves for Six Forms of AAP Mathematics 
Test Fit to Form 26A Target Information Curve 

Figure 2. Sub-content Fitting to Form 26A Content Items 

Figure 3. TCCs for the Target Test Form 26A and 
Six Test Forms Fitted by ITEMSEL 